2024 VIS Area Curation Committee Executive Summary
Summary
This report summarizes the findings, recommendations, and process by the VIS Area Curation Committee (ACC) regarding the areas and keywords used for paper submissions to IEEE VIS 2024. It is based on the 2021, 2022, and 2023 ACC committee reports, updated with the 2024 data. According to the Charter, the goal of this committee is to analyze and report how submissions made use of the areas and keywords to describe their contribution. It is important to understand when these descriptors no longer adequately cover the breadth of research presented at VIS.
We use submission and bidding information from VIS 2024 to analyze recent trends following the move to an area model.
Given this information, the move appears to be successful overall. However, we see substantial growth in the Applications area and hence recommend either splitting the area to create 7 areas in total, or allowing authors to indicate a secondary area among the existing ones to help balance the load. Our topic analysis of the last four years of submissions has not revealed an obvious way to split the Applications area.
Otherwise, our analysis suggests that submissions are relatively balanced across areas, keywords are (with a small exception) well distributed, and the unified PC appears to provide broad and overlapping coverage.
Regarding author satisfaction with the areas, from the IEEE VIS Areas Feedback Survey 2024:
With 103 responses, we did get some data – it’s not a particularly high percentage of responses, but it’s a good absolute number.
We received most responses from authors who submitted to areas 1 (theory…), 2 (applications), and 4 (representations…), and fewer from areas 3 (systems), 5 (data transformations), and 6 (analytics…).
Very few (6) could not find a suitable area. A fair share of the authors (25 of 103), however, state that they could also have submitted to another area.
8 authors who submitted to applications would also have submitted to another area.
Area 6 came up most often as a possible alternative area.
By and large, it seems that most authors do not see an immediate need to change the areas.
Quite a few of the authors (18 of 103) thought that area 4 needs improvement.
Also quite a few (12) thought that area 6 should be improved.
When asked regarding merging/splitting areas, “splitting area 2 (applications)” came up most often.
The authors sent a pretty strong signal that indicating a second-choice area would be welcome.
The full data and source code to rebuild this project are available here.
Committee members 2024: Jean-Daniel Fekete (co-chair), Alexander Lex (co-chair), Helwig Hauser, Ingrid Hotz, David Laidlaw, Torsten Möller, Michael Papka, Danielle Szafir, Yingcai Wu.
Committee members 2023: Steven Drucker (chair), Jean-Daniel Fekete, Ingrid Hotz, David Laidlaw, Alexander Lex, Torsten Möller, Michael Papka, Hendrik Strobelt, Shigeo Takahashi.
Committee members 2022: Steven Drucker (chair), Ingrid Hotz, David Laidlaw, Heike Leitte, Torsten Möller, Carlos Scheidegger, Hendrik Strobelt, Shigeo Takahashi, Penny Rheingans.
Committee members 2021: Alex Endert (chair), Steven Drucker (next chair), Issei Fujishiro, Christoph Garth, Heidi Lam, Heike Leitte, Carlos Scheidegger, Hendrik Strobelt, Penny Rheingans.
Last edited: 2024-08-20.
Code
import itertoolsimport pandas as pdimport numpy as np# Import the necessaries librariesimport plotly.offline as pioimport plotly.graph_objs as goimport plotly.express as px# [jdf] no need to specify the renderer but, for interactive use, init_notebook should be called# pio.renderers.default = "jupyterlab"# Set notebook mode to work in offline# pio.init_notebook_mode()# pio.init_notebook_mode(connected=True)width =750import sqlite3#### Data Preparation# static data – codes -> names etc.staticdata =dict( decision = { 'C': 'Confer vs. cond Accept', # relevant for the 2020 and 2021 data have a different meaning'A': 'Accept', # for the 2020 data'A2': 'Accept', # after the second round, should be 120 in 2022'R': 'Reject', # reject after the first round -- should be 322 in 2022'R2': 'Reject in round 2', # reject after the second round -- should be 2 in 2022'R-2nd': 'Reject in round 2', 'DR-S': 'Desk Reject (Scope)', # should be 7 in 2022'DR-P': 'Desk Reject (Plagiarism)', # should be 4 in 2022'AR-P': 'Admin Reject (Plagiarism)', # should be 1 in 2022'DR-F': 'Desk Reject (Format)', # should be 4 in 2022'R-Strong': 'Reject Strong', # cannot resubmit to TVCG for a year'T': 'Reject TVCG fasttrack', # Explicitly invited to resubmit to TVCG, status in major revision }, FinalDecision = { # Just flatten to Accept and Reject'C': 'Accept', 'A': 'Accept', # for the 2020 data'A2': 'Accept', # after the second round, should be 120 in 2022'R': 'Reject', # reject after the first round -- should be 322 in 2022'R2': 'Reject', # reject after the second round -- should be 2 in 2022'R-2nd': 'Reject', 'DR-S': 'Reject', # should be 7 in 2022'DR-P': 'Reject', # should be 4 in 2022'AR-P': 'Reject', # should be 1 in 2022'DR-F': 'Reject', # should be 4 in 2022'R-Strong': 'Reject','T': 'Reject', }, area = {'T&E': '(1) Theoretical & Empirical','App': '(2) Applications','S&R': '(3) Systems & Rendering','R&I': '(4) Representations & Interaction','DTr': '(5) Data Transformations','A&D': '(6) Analytics 
& Decisions', }, bid = { 0: 'no bid',1: 'want',2: 'willing',3: 'reluctant',4: 'conflict' }, stat = {'Prim': 'Primary', 'Seco': 'Secondary' }, keywords = pd.read_csv("../data/2021/keywords.csv", sep=';'), # 2021 is correct as there was no new keywords file in 2022 colnames = {'confsubid': 'Paper ID','rid': 'Reviewer','decision': 'Decision','area': 'Area','stat': 'Role','bid': 'Bid' })dbcon = sqlite3.connect('../data/vis-area-chair.db') #[jdf] assume data is in ..submissions_raw20 = pd.read_sql_query('SELECT * from submissions WHERE year = 2020', dbcon, 'sid')submissions_raw21 = pd.read_sql_query('SELECT * from submissions WHERE year = 2021', dbcon, 'sid')submissions_raw22 = pd.read_sql_query('SELECT * from submissions WHERE year = 2022', dbcon, 'sid')submissions_raw23 = pd.read_sql_query('SELECT * from submissions WHERE year = 2023', dbcon, 'sid')submissions_raw24 = pd.read_sql_query('SELECT * from submissions WHERE year = 2024', dbcon, 'sid')submissions_raw = pd.read_sql_query('SELECT * from submissions', dbcon, 'sid')#print(submissions_raw24)submissions = (submissions_raw .join( pd.read_sql_query('SELECT * from areas', dbcon, 'aid'), on='aid' ) .assign(Keywords =lambda df: (pd .read_sql_query('SELECT * FROM submissionkeywords', dbcon, 'sid') .loc[df.index] .join( pd.read_sql_query('SELECT * FROM keywords', dbcon, 'kid'), on='kid' ) .keyword .groupby('sid') .apply(list) )) .assign(**{'# Keywords': lambda df: df.Keywords.apply(len)}) .assign(**{'FinalDecision': lambda df: df['decision']}) .replace(staticdata) .rename(columns = staticdata['colnames']) .drop(columns = ['legacy', 'aid'])# .set_index('sid')# .set_index('Paper ID')# note -- I changed the index, since 'Paper ID' was not unique for multiple years.# By not setting the index to 'Paper ID' the index remains with 'sid'.# However, 'sid' is used as a unique index in the creation of the database anyways.)# replace the old 'Paper ID' with a unique identifier, so that the code from 2021 will worksubmissions = 
submissions.rename(columns = {'Paper ID':'Old Paper ID'})submissions.reset_index(inplace=True)submissions['Paper ID'] = submissions['sid']submissions = submissions.set_index('Paper ID')#submissions colums: (index), sid (unique id), Paper ID (unique), Old Paper ID, Decision, year, Area, Keywords (as a list), # Keywordsall_years = submissions['year'].unique()#rates_decision computes the acceptance rates (and total number of papers) per year#rates_decision: (index), Decision, year, count, Percentagerates_decision = (submissions .value_counts(['Decision', 'year']) .reset_index()# .rename(columns = {0: 'count'}))rates_decision['Percentage'] = rates_decision.groupby(['year'])['count'].transform(lambda x: x/x.sum()*100)rates_decision = rates_decision.round({'Percentage': 1})#rates_decision computes the acceptance rates (and total number of papers) per year#rates_decision: (index), Decision, year, count, Percentagerates_decision_final = (submissions .value_counts(['FinalDecision', 'year']) .reset_index()# .rename(columns = {0: 'count'}))rates_decision_final['Percentage'] = rates_decision_final.groupby(['year'])['count'].transform(lambda x: x/x.sum()*100)rates_decision_final = rates_decision_final.round({'Percentage': 1})#submissions#bids_raw: (index), Reviewer ID, sid (unique paper identifier over mult years), match score, bid of the reviewer, role of the reviewer, Paper IDbids_raw = (pd .read_sql_query('SELECT * from reviewerbids', dbcon) .merge(submissions_raw['confsubid'], on='sid') .replace(staticdata) .rename(columns = staticdata['colnames']))#bids_raw## Renaming Paper ID to Old Paper ID, setting Paper ID to sid, keeping all 3 for now...bids_raw = bids_raw.rename(columns = {'Paper ID':'Old Paper ID'})bids_raw['Paper ID'] = bids_raw['sid']# bids = Reviewer, sid, Bid (how the reviewer bid on this paper)# doesn't include review/sid that were not bid for [.query('Bid != "no bid"')]bids = (bids_raw .query('Bid != "no bid"')# Paper ID is not unique over multiple years!# 
.drop(columns = ['sid'])# [['Reviewer','Paper ID', 'Bid']] [['Reviewer','sid', 'Paper ID', 'Bid']] .reset_index(drop =True))# matchscores becomes a table to reviewer/sid with the match scores# many of these will be "NaN" since we now have multiple years together.# we need to check whether the reviewer IDs remain unique across the years!matchscores = (bids_raw# Paper ID is not unique over multiple years!# [['Reviewer','Paper ID','match']] [['Reviewer','sid','Paper ID','match']]# Paper ID is not unique over multiple years!# .set_index(['Reviewer', 'Paper ID']) .set_index(['Reviewer', 'Paper ID']) .match .unstack(level=1))# assignments = Reviewer, sid, Role (primary, secondary)# doesn't include review/sid that were not assigned [.query('Role != ""')]assignments = (bids_raw .query('Role != ""')# Paper ID is not unique over multiple years!# [['Reviewer', 'Paper ID', 'Role']] [['Reviewer', 'sid', 'Paper ID', 'Role']] .reset_index(drop =True))del dbcon#### Plot Defaultsacc_template = go.layout.Template()acc_template.layout =dict( font =dict( family='Fira Sans', color ='black', size =13 ), title_font_size =14, plot_bgcolor ='rgba(255,255,255,0)', paper_bgcolor ='rgba(255,255,255,0)', margin =dict(pad=10), xaxis =dict( title =dict( font =dict( family='Fira Sans Medium', size=13 ), standoff =10 ), gridcolor='lightgray', gridwidth=1, automargin =True, fixedrange =True, ), yaxis =dict( title =dict( font =dict( family='Fira Sans Medium', size=13 ), standoff =10, ), gridcolor='lightgray', gridwidth=1, automargin =True, fixedrange =True, ), legend=dict( title_font_family="Fira Sans Medium", ), colorway = px.colors.qualitative.T10, hovermode ='closest', hoverlabel=dict( bgcolor="white", bordercolor='lightgray', font_color ='black', font_family ='Fira Sans' ),)acc_template.data.bar = [dict( textposition ='inside', insidetextanchor='middle', textfont_size =12,)]px.defaults.template = acc_templatepx.defaults.category_orders = {'Decision': 
list(staticdata['decision'].values()),'FinalDecision': list(staticdata['FinalDecision'].values()),'Area': list(staticdata['area'].values()),'Short Name': staticdata['keywords']['Short Name'].tolist(),}config =dict( displayModeBar =False, scrollZoom =False, responsive =False)def aspect(ratio):return { 'width': width, 'height': int(ratio*width) }# useful data sub-products#k_all columns: (index), Paper ID, Old Paper ID, Decision, year, Area, Keywords (as a list), # Keywords, Keyword, Category, Subcategory, Short Name, Descriptionk_all = (submissions .join(submissions['Keywords'] .explode() .rename('Keyword') ) .reset_index(level =0) .merge(staticdata['keywords'], on='Keyword'))# (Old) Paper ID is not unique, however, the 'sid' is (which is the current index)#k_all.reset_index(inplace=True)#k_all.rename(columns = {'sid':'Paper ID'},inplace = True)#k_all = k_all.merge(staticdata['keywords'], on='Keyword')#k_all#k_total columns: Category, Subcategory, Short Name, Keyword, Description, #Submissions, year# counts the total number of submissions per keyword and yeark_total = staticdata['keywords'].merge( k_all.value_counts(['Short Name','year']) .rename('# Submissions') .reset_index(),# on = 'Short Name', how ='right'# how = 'outer')#k_cnt: how often was a particular keyword used among all submissions within a year????#k_cnt columns: (index), Short Name, year, c, Category, Subcategory, Keyword, Description# not clear how k_cnt and k_total differ!k_cnt = (k_all .value_counts(['Short Name','year'], sort=False) .rename('c') .to_frame() .reset_index() .merge(staticdata['keywords'], on='Short Name'))
Highlights
Some highlights of the data to support our current recommendations:
Submissions
The number of submissions peaked in 2020 at 585 papers, which is likely caused by the pandemic and the one-month extension to the deadline given because of it. The years 2021 and 2022 saw lower numbers of submissions with 442 and 460 respectively. Submissions increased in 2023 (539) and 2024 (544), and are now almost back to the peak of 2020.
fig = px.bar(totals,
    y='year',
    x='count',
    orientation='h',
    labels={'count': 'Number of Submissions', 'year': 'Year'},
    text='count',
).update_layout(
    yaxis=dict(autorange="reversed", tickmode='linear'),
    title='Submissions Numbers since 2020',
    xaxis_title='Number of Submissions',
    **aspect(0.35)
)
fig.show(config=config)
Figure 1: Submissions since 2020
Acceptance Rates
Acceptance rates fluctuated slightly from 2020 to 2023 (26.8%, 24.9%, 26.1%, and 25.8%), though there was a dip (24.4%) in 2021. For 2024, we see a rather sharp drop to 22.4%, which is partially caused by a lower first-round acceptance rate (23.2%) and amplified by 3 second-round rejects.
This trend might mean that the reviewers want higher-quality articles, which would be good if the research field were becoming more stable and reaching a steady state, or that they are becoming more conservative, which would be detrimental to the development of novel, less consensual research directions. We hope the VSC will enquire more deeply and provide guidelines to the OPC and reviewers to steer the conference in the right direction.
Comment Alex: here it would be good to get input from the OPCs. I don’t know whether we should speculate on reasons in this report.
Code
fig = px.bar(rates_decision_final,
    x='Percentage',
    y='year',
    barmode='stack',
    orientation='h',
    color='FinalDecision',
    text='Percentage',
    custom_data=['FinalDecision', 'count'],
).update_layout(
    yaxis=dict(autorange="reversed", tickmode='linear'),
    title='Acceptance Rates since 2020',
    xaxis_title='Percentage of Submissions',
    **aspect(0.35)
).update_traces(
    hovertemplate='%{customdata[1]} submissions in %{y} have decision %{customdata[0]}<extra></extra>',
).show(config=config)
Submissions across the (reformulated) areas are relatively stable between 2021 and 2024 with some notable exceptions. Applications has been a large area since the start of the area model (100 submissions in 2021), but has seen growth in 2023 (123 submissions) and especially in 2024 (154). Applications is hence three times as large as the smaller areas Data Transformations and Systems & Rendering, indicating an uneven load for the area paper chairs.
Among the other larger areas, Theoretical & Empirical has seen a slight dip in 2024, while Representations & Interaction has risen from 86 to 108 submissions. However, these numbers remain in the desired range of submissions handled by a team of APCs.
Acceptance Rates in Areas
Code
recent_submissions = submissions[submissions['year'] !=2020]tmptotal = (recent_submissions .value_counts(['Area', 'year']) .reset_index() .rename(columns = {'count': 'total'}))tmp = (recent_submissions .value_counts(['Area', 'FinalDecision', 'year']) .reset_index()# .rename(columns = {0: 'count'}))tmpfinal = pd.merge(left=tmp, right=tmptotal, on=['Area','year'])tmpfinal['percentage']=round(tmpfinal['count']/tmpfinal['total'] *1000)/10.0fig = px.bar(tmpfinal, x ='year', y ='percentage', barmode ='stack', orientation ='v', color ='FinalDecision', text ='percentage', custom_data = ['FinalDecision'], facet_col='Area', category_orders = {"year": [2021,2022, 2023, 2024]}, facet_col_spacing=0.06, # default is 0.03 ).update_layout( title ='Submissions by area and year', xaxis_title ='year', legend=dict( yanchor="top", y=1, # Adjust legends y-position xanchor="left", x=1.08, # ... and x-position to avoid overlapping ),**aspect(0.8) ).update_xaxes(type='category').update_traces( hovertemplate ='%{y}% of submissions in %{x} have decision %{customdata[0]}<extra></extra>', )fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))for i,a inenumerate(fig.layout.annotations):if (i%2): a.update(yshift=-15)# Add horizontal line at 75% for each subplotfig.add_shape(type="line", x0=0, x1=1, # from the left to the right of the plot y0=75, y1=75, # at y = 75% on the y-axis xref='paper', # relative to the entire plot width yref='y', # relative to the y-axis line=dict(color="Darkgray", width=2),)# Add a label next to the line at 75%fig.add_annotation( x=1, # Position near the end of the plot (right side) y=75, # Position at 75% on the y-axis xref='paper', # Relative to the entire plot width yref='y', # Relative to the y-axis text="75% Threshold", # The label text showarrow=False, # No arrow, just text font=dict(size=12, color="Black"), # Customize the font size and color xanchor='left', # Anchor the text to the left side of the x-position yanchor='middle'# Center the text 
vertically on the y-position)fig.show(config=config)
Figure 4: Acceptance Rate per Area since 2021
Acceptance rates have been fairly consistent across areas in 2021, but not so in 2022, 2023, and 2024.
Generally, Theoretical & Empirical seems to have higher acceptance rates than other areas. Analytics & Decisions seems to become substantially more selective every year, accepting only 16.9% of all submissions in 2024. Systems & Rendering fluctuates over time, with 33.3% accepted in 2023 but only 16.7% in 2024. Notably, Systems & Rendering is one of the smallest areas, so these fluctuations are caused by a relatively small number of papers.
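To illustrate why small areas fluctuate more, here is a minimal sketch that treats each decision as an independent draw (a simplifying assumption, not the committee's method) and computes the binomial standard error of an acceptance rate. The area sizes and the 25% rate are hypothetical round numbers, not values from the data.

```python
import math


def acceptance_rate_std_error(rate_percent: float, n_submissions: int) -> float:
    """Binomial standard error of an acceptance rate, in percentage points."""
    p = rate_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_submissions)


# Hypothetical area sizes; 25% is an illustrative acceptance rate.
for n in (40, 150):
    one_paper = 100.0 / n  # shift in the rate caused by a single decision
    se = acceptance_rate_std_error(25.0, n)
    print(f"n={n}: one paper moves the rate by {one_paper:.1f} points, "
          f"standard error is about {se:.1f} points")
```

Under this assumption, the same underlying acceptance rate produces visibly larger year-to-year swings in a small area than in a large one.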
Keywords
Keyword frequencies range from 5 to 120. The keywords with the highest number of occurrences are not very useful for categorizing papers on their own, but they are meaningful, and differentiation works effectively with accompanying keywords. We believe that having five papers that use a keyword is sufficient to warrant retaining it.
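The retention rule above can be sketched as a simple count per keyword. The keyword names and submissions below are hypothetical placeholders, and `MIN_PAPERS` is an illustrative name for the threshold of five mentioned in the text.

```python
from collections import Counter

MIN_PAPERS = 5  # retention threshold suggested in the text

# Hypothetical keyword lists, one list per submission.
submission_keywords = (
    [["keyword_a", "keyword_b"]] * 5
    + [["keyword_a", "keyword_c"]] * 2
    + [["keyword_d"]]
)

# Count each keyword at most once per submission.
usage = Counter(kw for kws in submission_keywords for kw in set(kws))

# Keywords used by fewer than MIN_PAPERS submissions are candidates for review.
below_threshold = sorted(kw for kw, n in usage.items() if n < MIN_PAPERS)
print(below_threshold)
```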
Code
# do a manual histogram to include non-specified keywords
# 'Submission %' is used below, so it is computed here (share of keyword uses per year)
k_total['Submission %'] = k_total.groupby(['year'])['# Submissions'].transform(lambda x: x/x.sum()*100)
k_total['Year'] = k_total['year'].astype(str)  # to get categorical colors
k_year = k_total.pivot(index="year", values="Submission %", columns="Short Name").T
px.scatter(k_total,
    y='Short Name',
    x='Submission %',  # 'Submission %',
    color='Year',
    category_orders={"Year": ["2024", "2023", "2022", "2021", "2020"]}
    # facet_row='year',
    # category_orders={'year': reversed([2020, 2021, 2022, 2023, 2024])},
).update_traces(
    # y holds the keyword and x the percentage in this orientation
    hovertemplate="'%{y}' specified in %{x:.1f}% of submissions<extra></extra>",
).update_layout(
    yaxis_tickfont_size=8,
    yaxis_dtick=1,
    yaxis_tickmode='linear',
    # yaxis_dtick = 50,
    hovermode='closest',
    title='Frequency of keywords across submissions',
    **aspect(1)
).show(config=config)
Figure 5: Frequency of Keywords per year
All the areas showed growth from 2022 to 2024 except for “Analytics & Decisions”, though that area showed significant growth in 2022. The areas for “Applications” and “Representations & Interaction” have grown significantly. “Applications” has surpassed 120 submissions in 2024, which might eventually argue for action. More to write?
Code
recent_submissions = submissions[submissions['year'] !=2020]tmptotal = (recent_submissions .value_counts(['Area', 'year']) .reset_index() .rename(columns = {'count': 'total'}))tmp = (recent_submissions .value_counts(['Area', 'FinalDecision', 'year']) .reset_index()# .rename(columns = {0: 'count'}) #[jdf] no need to rename, the count is already in the 'count' attribute.)tmpfinal = pd.merge(left=tmp, right=tmptotal, on=['Area','year'])tmpfinal['percentage']=round(tmpfinal['count']/tmpfinal['total'] *1000)/10.0tmpfinal['yearcat'] = tmpfinal['year'].astype('category')fig = px.bar(tmpfinal, x ='year', y ='count', barmode ='stack', orientation ='v', color ='yearcat', text ='count', custom_data = ['FinalDecision'], facet_col='Area', category_orders = {"year": [2021,2022, 2023, 2024]}, facet_col_spacing=0.06, # default is 0.03 ).update_layout( title ='Submissions by area and year', xaxis_title ='year',**aspect(0.8) ).update_xaxes(type='category').update_traces( hovertemplate ='%{x} submissions in %{y} have decision %{customdata[0]}<extra></extra>', )fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))for i,a inenumerate(fig.layout.annotations):if (i%2): a.update(yshift=-15)fig.show(config=config)
Figure 6: Submissions by Area and Year
Deeper data investigation
This report is generated by members of the ACC for the current year, and prepared for the VSC. Upon review, it will be linked from the IEEE VIS website. The conclusions and discussion points are based on submission and reviewer data from IEEE VIS 2024 (and previous years). The report and the analysis performed focus on the use of keywords, areas, and reviewer matching. Thus, there are likely other aspects of conference organization which are not covered (but could be considered).
The report is broken down into the following sections. After the summary at the beginning, the data and analysis process is described. It shows which data we used, where it is stored, and how it is obtained. These processes can be adapted for future years of this committee.
(NB: Some of the plots shown above are repeated here from the highlights for the sake of completeness.)
Data and Process
We analyzed anonymized data containing information about the full paper submissions to VIS 2024, the reviews of these submissions, and the IPC bidding preferences. We analyzed this data to understand how well the areas and keywords characterize the body of work submitted this year. We also analyzed the IPC bidding information to understand how well the expertise of the IPC members covers the submissions. Below, we show highlights of our findings.
Note that in the analysis that follows, the submission/paper IDs and reviewer IDs are anonymized through a randomizer, and are not the IDs used in PCS for submissions and reviewers.
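As an illustration only, ID randomization of this kind could look like the sketch below. This is not the committee's anonymization script; the function name `randomize_ids` and the toy IDs are hypothetical.

```python
import random


def randomize_ids(original_ids, seed=None):
    """Map each distinct original ID to a distinct new ID in shuffled order.

    Hypothetical helper: the real anonymization script is not shown in this report.
    """
    rng = random.Random(seed)
    unique_ids = sorted(set(original_ids))
    new_ids = list(range(1, len(unique_ids) + 1))
    rng.shuffle(new_ids)
    return dict(zip(unique_ids, new_ids))


# Toy PCS-style IDs (made up); repeated IDs map to the same anonymized value.
pcs_ids = [1042, 17, 993, 17]
mapping = randomize_ids(pcs_ids)
anonymized = [mapping[i] for i in pcs_ids]
print(anonymized)
```

Keeping the mapping consistent within a year preserves joins between submissions and bids, while the original IDs never appear in the shared files.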
The data used to perform this analysis is a combination of paper submission data and reviewer bidding data. Both sets were anonymized to minimize the ability to identify IPC members, authors, or reviewers. This year's analysis uses the anonymized CSV files obtained directly from PCS. You can see the source code used to process the data and generate the plots in this document by clicking on the “Code” buttons, which will fold out the Python code used. The anonymization script that was used is located in the anonymization-scripts folder (and may need to be updated to correspond with changes made in PCS). In order to get ALL the data, it is currently run by James at PCS, who sends the resulting anonymized files to the committee, where they are stored in the corresponding year folder.
In order to facilitate longitudinal studies of this data, we are also providing a sqlite database with the 2020, 2021, and 2022 data in an attempt to make it easier to incorporate future years. This database (as well as the source code of this document) can be found here.
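A minimal sketch of a longitudinal query against such a database follows. It builds a toy in-memory database so that it runs on its own; the table name `submissions` and the columns `sid`, `year`, and `decision` mirror the ones used by the code in this report, but the rows are made up.

```python
import sqlite3

# Toy in-memory stand-in for the committee database (rows are hypothetical).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE submissions (sid INTEGER PRIMARY KEY, year INTEGER, decision TEXT)"
)
con.executemany(
    "INSERT INTO submissions (year, decision) VALUES (?, ?)",
    [(2022, "A2"), (2022, "R"), (2023, "R"), (2024, "A2"), (2024, "R"), (2024, "DR-S")],
)

# One query covers every year, so a new year only requires new rows, not new code.
per_year = dict(
    con.execute(
        "SELECT year, COUNT(*) FROM submissions GROUP BY year ORDER BY year"
    ).fetchall()
)
con.close()
print(per_year)
```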
Sanity Checks
We include some sanity checks on the data in order to make sure the data has been processed correctly. In 2024, we should have:
139 papers accepted after the second round
236 papers rejected after the first round
9 papers desk rejected
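Checks like the ones listed above can be automated by comparing expected and actual counts per decision. The sketch below uses hypothetical toy decisions and expected values; `check_decision_counts` is an illustrative helper, not part of the report's code.

```python
from collections import Counter


def check_decision_counts(decisions, expected):
    """Return {label: (expected, actual)} for every label whose count differs."""
    actual = Counter(decisions)
    return {
        label: (want, actual.get(label, 0))
        for label, want in expected.items()
        if actual.get(label, 0) != want
    }


# Hypothetical toy decisions; a real check would use the 2024 rows of `submissions`.
toy_decisions = ["Accept"] * 3 + ["Reject"] * 5 + ["Desk Reject (Scope)"] * 2
expected = {"Accept": 3, "Reject": 5, "Desk Reject (Scope)": 1}

mismatches = check_decision_counts(toy_decisions, expected)
print(mismatches)  # {'Desk Reject (Scope)': (1, 2)}
```

An empty result means the processed data agrees with the expected counts.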
Code
#rates_decision computes the acceptance rates (and total number of papers) per year#rates_decision: (index), Decision, year, count, Percentagerates_decision = (submissions .value_counts(['Decision', 'year']) .reset_index()# .rename(columns = {0: 'count'}) #[jdf] no need to rename, the count is already in the 'count' attribute.)rates_decision['Percentage'] = rates_decision.groupby(['year'])['count'].transform(lambda x: x/x.sum()*100)rates_decision = rates_decision.round({'Percentage': 1})#rates_decision computes the acceptance rates (and total number of papers) per year#rates_decision: (index), Decision, year, count, Percentagerates_decision_final = (submissions .value_counts(['FinalDecision', 'year']) .reset_index()# .rename(columns = {0: 'count'}) #[jdf] no need to rename, the count is already in the 'count' attribute.)rates_decision_final['Percentage'] = rates_decision_final.groupby(['year'])['count'].transform(lambda x: x/x.sum()*100)rates_decision_final = rates_decision_final.round({'Percentage': 1})#| output: truerates_decision_final.sort_values(by=['year', 'FinalDecision'], ascending=[False, True], ignore_index=True)#| label: fig-percent-submission-decision-per-year#| fig-cap: Percentage of decisions for the submissions per yearfig = px.bar(rates_decision, x ='count', y ='year', barmode ='stack', orientation ='h', color ='Decision', text ='count', custom_data = ['Decision'],).update_layout( yaxis=dict(autorange="reversed"), title ='Submissions', xaxis_title ='Number of Submissions',**aspect(0.45)).update_traces( hovertemplate ='%{x} submissions in %{y} have decision %{customdata[0]}<extra></extra>',).show(config=config)fig = px.bar(rates_decision, x ='Percentage', y ='year', barmode ='stack', orientation ='h', color ='Decision', text ='Percentage', custom_data = ['Decision','count'],).update_layout( yaxis=dict(autorange="reversed"), title ='Submissions', xaxis_title ='Percentage of Submissions',**aspect(0.45)).update_traces( hovertemplate ='%{customdata[1]} 
submissions in %{y} have decision %{customdata[0]}<extra></extra>',).show(config=config)
(a) Number
(b) Percentage
Figure 7: Decisions for the submissions per year
The wide ranges of decisions can be collapsed into more straightforward Accept or Reject (where Reject includes desk rejects, admin rejects, and rejections in round 1 or 2). The acceptance rate dropping off is a visible trend.
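The collapse into Accept and Reject can be sketched on toy data as follows. The decision codes follow the mapping used in this report ('C', 'A', and 'A2' count as accepts), while the rows themselves are hypothetical.

```python
import pandas as pd

# Hypothetical toy data using the report's decision codes.
toy = pd.DataFrame({
    "year": [2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024, 2024],
    "decision": ["A2", "R", "DR-S", "R2", "A2", "R", "R", "R-2nd", "DR-F"],
})

# Every code that is not an accept collapses to "Reject".
ACCEPT_CODES = {"C", "A", "A2"}
toy["FinalDecision"] = toy["decision"].map(
    lambda code: "Accept" if code in ACCEPT_CODES else "Reject"
)

# Count decisions per year, then express them as a percentage of that year.
counts = (
    toy.groupby(["year", "FinalDecision"])
    .size()
    .rename("count")
    .reset_index()
)
counts["Percentage"] = (
    counts.groupby("year")["count"].transform(lambda x: x / x.sum() * 100).round(1)
)
print(counts)
```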
Code
fig = px.bar(rates_decision_final,
    x='count',
    y='year',
    barmode='stack',
    orientation='h',
    color='FinalDecision',
    text='count',
    custom_data=['FinalDecision'],
).update_layout(
    yaxis=dict(autorange="reversed"),
    title='Submissions',
    xaxis_title='Number of Submissions',
    **aspect(0.45)
).update_traces(
    hovertemplate='%{x} submissions in %{y} have decision %{customdata[0]}<extra></extra>',
).show(config=config)

fig = px.bar(rates_decision_final,
    x='Percentage',
    y='year',
    barmode='stack',
    orientation='h',
    color='FinalDecision',
    text='Percentage',
    custom_data=['FinalDecision', 'count'],
).update_layout(
    yaxis=dict(autorange="reversed"),
    title='Submissions',
    xaxis_title='Percentage of Submissions',
    **aspect(0.35)
).update_traces(
    hovertemplate='%{customdata[1]} submissions in %{y} have decision %{customdata[0]}<extra></extra>',
).show(config=config)
(a) Number
(b) Percentage
Figure 8: Acceptance per year
Submissions per Area.
We wanted to understand how submissions were distributed by area, including acceptance decisions. Submissions to each area were within reasonable upper and lower limits, and decisions did not appear partial to any individual area.
Code
def group_stat(g):return pd.DataFrame({'# Submissions': g,'% Submissions': round(g/g.sum()*100,1),'Total': g.sum() })tmp = (submissions[submissions.year >2020] .value_counts(['Area', 'Decision', 'year']) .reset_index() .rename(columns = {0: 'count'}))fig = px.bar(tmp, x ='count', y ='Area', barmode ='stack', orientation ='h', color ='Decision', text ='count', custom_data = ['Decision'], facet_row='year', category_orders={'year': [2024,2023,2022, 2021]}, #, 2020]},).update_layout( title ='Submissions by area and year', xaxis_title ='Number of Submissions', yaxis=dict( autorange="reversed", tickfont=dict(size=12), # Adjust y-label fontsize ),**aspect(1.3)).update_traces( hovertemplate ='%{x} submissions in %{y} have decision %{customdata[0]}<extra></extra>', texttemplate='%{text}', textangle=0# Force labels to have horizontal orientation).show(config=config)fig = px.bar(tmp, x ='count', y ='Area', barmode ='stack', orientation ='h', color ='Decision', text ='count', custom_data = ['Decision'],).update_layout( title ='Submissions by area all years', xaxis_title ='Number of Submissions all years', yaxis=dict( autorange="reversed", tickfont=dict(size=12), # Adjust y-label fontsize ),**aspect(0.5)).update_traces( hovertemplate ='%{x} submissions in %{y} have decision %{customdata[0]}<extra></extra>',).show(config=config)data=[]count=0for my_year in all_years: count=count+1 trace1=go.Bar( x=tmp[tmp['year']==my_year]["Area"], y=tmp[tmp['year']==my_year]['count'], customdata = tmp[tmp['year']==my_year]['Decision'], hovertemplate="%{y} papers were %{customdata[0]} in", name=f"{my_year}", offsetgroup=count, ) data.append(trace1)fig2 = go.Figure( data=data, layout=go.Layout( title="Comparing # submissions 2021, 2022, 2023, and 2024", xaxis_title="Areas" ))fig2.show()
Code
recent_submissions = submissions[submissions['year'] != 2020]
tmptotal = (recent_submissions
    .value_counts(['Area', 'year'])
    .reset_index()
    .rename(columns={'count': 'total'})
)
tmp = (recent_submissions
    .value_counts(['Area', 'FinalDecision', 'year'])
    .reset_index()
    # .rename(columns = {0: 'count'})
)
tmpfinal = pd.merge(left=tmp, right=tmptotal, on=['Area', 'year']).sort_values("year")
tmpfinal['percentage'] = round(tmpfinal['count']/tmpfinal['total']*1000)/10.0

# Percentage of decisions per area, one facet row per year
fig = px.bar(tmpfinal,
    x='percentage',
    y='Area',
    barmode='stack',
    orientation='h',
    color='FinalDecision',
    text='percentage',
    custom_data=['FinalDecision'],
    facet_row='year',
).update_layout(
    yaxis=dict(autorange="reversed"),
    title='Submissions by area and year',
    xaxis_title='Percentage of Submissions',
    **aspect(0.8)
).update_traces(
    hovertemplate='%{x}% of submissions in %{y} have decision %{customdata[0]}<extra></extra>',
    texttemplate='%{text}',
    textangle=0  # Force labels to have horizontal orientation
).show(config=config)

# Aggregate over all recent years: sum only the counts, then recompute the
# percentage from the aggregated columns (not from the per-year table).
tmpfinal2 = tmpfinal.groupby(['Area', 'FinalDecision'])[['count', 'total']].sum().reset_index()
tmpfinal2['newpercentage'] = round(tmpfinal2['count']/tmpfinal2['total']*1000)/10

fig = px.bar(tmpfinal2,
    x='newpercentage',
    y='Area',
    barmode='stack',
    orientation='h',
    color='FinalDecision',
    text='newpercentage',
    custom_data=['FinalDecision'],
).update_layout(
    yaxis=dict(autorange="reversed"),
    title='Submissions by area for all recent years',
    xaxis_title='Percentage of Submissions for all recent years',
    **aspect(0.35)
).update_traces(
    hovertemplate='%{x}% of submissions in %{y} have decision %{customdata[0]}<extra></extra>',
    texttemplate='%{text}',
    textangle=0  # Force labels to have horizontal orientation
).show(config=config)

# Grouped bars: one trace per year, one group per area
data = []
count = 0
recent_years = all_years[all_years != 2020]
for my_year in recent_years:
    count = count + 1
    trace1 = go.Bar(
        x=tmp[tmp['year'] == my_year]["Area"],
        y=tmp[tmp['year'] == my_year]['count'],
        customdata=tmp[tmp['year'] == my_year]['FinalDecision'],
        hovertemplate="%{y} papers were %{customdata} in",
        name=f"{my_year}",
        offsetgroup=count,
    )
    data.append(trace1)
fig2 = go.Figure(
    data=data,
    layout=go.Layout(
        title="Comparing # submissions 2021, 2022, 2023, and 2024",
        xaxis_title="Areas"
    )
)
fig2.show()
Submissions and Keywords used
We also analyzed how often keywords were used in the submissions. The frequency of keywords used is reasonable. The keywords at the tail of the distribution may require action.
How many papers were submitted to each area, and what is the breakdown of decisions? Anything else to add?
Code
# do a manual histogram to include non-specified keywords
px.scatter(k_total,
    x = 'Short Name',
    y = 'Submission %',  # 'Submission %',
    color = 'Year',
    category_orders = {"Year": ["2024", "2023", "2022", "2021", "2020"]}
    # facet_row='year',
    # category_orders={'year': reversed([2020, 2021, 2022, 2023, 2024])},
).update_traces(
    hovertemplate = "'%{x}' specified in %{y} submissions<extra></extra>",
).update_layout(
    xaxis_tickfont_size = 8,
    xaxis_dtick = 1,
    # yaxis_dtick = 50,
    hovermode = 'closest',
    title = 'Frequency of keywords across submissions',
    **aspect(0.8)
).show(config=config)
Keywords with strong variations between 2023 and 2024 are listed here, together with their historical differences. Despite these variations, there is no strong trend over the four years; the changes mostly reflect year-to-year fluctuation.
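A simple way to pick out such keywords is to compare each keyword's share of submissions in two years and keep those whose difference exceeds a threshold. The sketch below assumes a hypothetical mapping from keyword to per-year submission percentage; the function name, the threshold, and the numbers are illustrative and do not come from the report's data.

```python
def largest_shifts(pct_by_year, year_a, year_b, min_delta=1.0):
    """List (keyword, change in percentage points) from year_a to year_b.

    pct_by_year: {keyword: {year: percent of submissions}} (hypothetical shape).
    Only changes of at least `min_delta` points are kept, largest first.
    """
    shifts = []
    for keyword, by_year in pct_by_year.items():
        delta = by_year.get(year_b, 0.0) - by_year.get(year_a, 0.0)
        if abs(delta) >= min_delta:
            shifts.append((keyword, round(delta, 1)))
    return sorted(shifts, key=lambda item: -abs(item[1]))

# illustrative numbers only
example_pct = {
    "A": {2023: 5.0, 2024: 8.5},
    "B": {2023: 4.0, 2024: 4.2},
    "C": {2023: 6.0, 2024: 3.0},
}
shifts = largest_shifts(example_pct, 2023, 2024)
print(shifts)
```

Running the same comparison for earlier year pairs shows whether a keyword's change continues in one direction or reverses, which is how a trend can be told apart from yearly fluctuation.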
# do a manual histogram to include non-specified keywords
px.bar(k_total,
    x = 'Short Name',
    y = '# Submissions',
    color = 'Category',
    facet_row = 'year',
    category_orders = {'year': reversed([2020, 2021, 2022, 2023, 2024])},
).update_traces(
    hovertemplate = "'%{x}' specified in %{y} submissions<extra></extra>",
).update_layout(
    xaxis_tickfont_size = 8,
    xaxis_dtick = 1,
    yaxis_dtick = 50,
    hovermode = 'closest',
    title = 'Frequency of keywords across submissions',
    **aspect(0.8)
).show(config=config)
Code
k_cnt = staticdata['keywords'].merge(
    pd.DataFrame(staticdata['area'].values(), columns = ['Area']),
    how = 'cross'
).merge(
    k_all
        .value_counts(['Short Name', 'Area'])
        .groupby(level=0)
        .apply(group_stat)
        .droplevel(level=0)  # [jdf] Needed to avoid duplicate 'Short Name' column
        .reset_index(),
    how = 'outer'
).fillna(1e-10)  # needed for sorting, Plotly bug?

# do manual histogram without 2020 areas
# (copy so that the column assignments below do not write into a slice of k_cnt)
k_cnt_new = k_cnt[~k_cnt.Area.isin(['VAST', 'SciVis', 'InfoVis'])].copy()

# with 2020 absolute
px.bar(k_cnt,
    x = 'Short Name',
    y = '# Submissions',
    color = 'Area',
    custom_data = ['Area']
).update_traces(
    hovertemplate = 'Keyword "%{x}" specified by %{y} submissions from area "%{customdata}"<extra></extra>'
).update_layout(
    barmode = 'stack',
    xaxis_dtick = 1,
    xaxis_tickfont_size = 8,
    xaxis_fixedrange = True,
    yaxis_fixedrange = True,
    xaxis_categoryorder = 'total descending',
    title = 'Frequency of keywords across submissions, by area (for all years)',
    **aspect(0.5),
).show(config=config)

k_cnt['Submissions_pct'] = k_cnt.groupby(['Short Name'])['# Submissions'].transform(lambda x: x / x.sum() * 100)
# round the percentage column (the original rounded a non-existent 'Percentage' column)
k_cnt = k_cnt.round({'Submissions_pct': 1})
# make sure order is consistent over absolute & percentage plot
k_cnt['Totals'] = k_cnt.groupby(['Short Name'])['# Submissions'].transform(lambda x: x.sum())
k_cnt = k_cnt.sort_values('Totals', ascending=False)

# with 2020 in percent
px.bar(k_cnt,
    x = 'Short Name',
    y = 'Submissions_pct',
    color = 'Area',
    custom_data = ['Area']
).update_traces(
    hovertemplate = 'Keyword "%{x}" specified by %{y}% of submissions from area "%{customdata}"<extra></extra>'
).update_layout(
    barmode = 'stack',
    xaxis_dtick = 1,
    xaxis_tickfont_size = 8,
    xaxis_fixedrange = True,
    yaxis_fixedrange = True,
    xaxis_categoryorder = 'trace',
    title = 'Frequency of keywords across submissions, by area (for all years)',
    yaxis_title = '% of Submissions',
    **aspect(0.5)
).show(config=config)

# without 2020 absolute
px.bar(k_cnt_new,
    x = 'Short Name',
    y = '# Submissions',
    color = 'Area',
    custom_data = ['Area']
).update_traces(
    hovertemplate = 'Keyword "%{x}" specified by %{y} submissions from area "%{customdata}"<extra></extra>'
).update_layout(
    barmode = 'stack',
    xaxis_dtick = 1,
    xaxis_tickfont_size = 8,
    xaxis_fixedrange = True,
    yaxis_fixedrange = True,
    xaxis_categoryorder = 'total descending',
    title = 'Frequency of keywords across submissions, by area (excluding 2020)',
    **aspect(0.5)
).show(config=config)

k_cnt_new['Submissions_pct'] = k_cnt_new.groupby(['Short Name'])['# Submissions'].transform(lambda x: x / x.sum() * 100)
k_cnt_new = k_cnt_new.round({'Submissions_pct': 1})
# make sure order is consistent over absolute & percentage plot
k_cnt_new['Totals'] = k_cnt_new.groupby(['Short Name'])['# Submissions'].transform(lambda x: x.sum())
k_cnt_new = k_cnt_new.sort_values('Totals', ascending=False)

# without 2020 in percent
px.bar(k_cnt_new,
    x = 'Short Name',
    y = 'Submissions_pct',
    color = 'Area',
    custom_data = ['Area']
).update_traces(
    hovertemplate = 'Keyword "%{x}" specified by %{y}% of submissions from area "%{customdata}"<extra></extra>'
).update_layout(
    barmode = 'stack',
    xaxis_dtick = 1,
    xaxis_tickfont_size = 8,
    xaxis_fixedrange = True,
    yaxis_fixedrange = True,
    xaxis_categoryorder = 'trace',
    title = 'Frequency of keywords across submissions, by area (excluding 2020)',
    yaxis_title = '% of Submissions',
    **aspect(0.5)
).show(config=config)
How are keywords distributed across areas?
Code
# do a manual histogram to include non-specified keywords
k_cnt = staticdata['keywords'].merge(
    pd.DataFrame(staticdata['area'].values(), columns = ['Area']),
    how = 'cross'
).merge(
    k_all
        .value_counts(['Short Name', 'Area'])
        .rename('# Submissions')
        .reset_index(),
    how = 'outer'
).fillna(1e-10)  # needed for sorting, Plotly bug?

px.bar(k_cnt,
    x = 'Short Name',
    y = '# Submissions',
    color = 'Area',
    custom_data = ['Area']
).update_traces(
    hovertemplate = 'Keyword "%{x}" specified by %{y} submissions from area "%{customdata}"<extra></extra>'
).update_layout(
    barmode = 'stack',
    xaxis_dtick = 1,
    xaxis_tickfont_size = 8,
    xaxis_fixedrange = True,
    yaxis_fixedrange = True,
    xaxis_categoryorder = 'total descending',
    title = 'Frequency of keywords across submissions, by area',
    **aspect(0.5)
).show(config=config)
How many submissions specified a given number of keywords?
Code
tmp = (submissions
    .value_counts(['# Keywords', 'Area'])
    .rename('# Submissions')
    .reset_index()
)

px.bar(tmp,
    x = '# Keywords',
    y = '# Submissions',
    barmode = 'stack',
    color = 'Area',
    custom_data = ['Area'],
).update_traces(
    hovertemplate = '%{y} submissions specified %{x} keywords in area "%{customdata}"<extra></extra>',
).update_layout(
    xaxis_dtick = 1,
    title = 'Keyword count per submission',
    **aspect(0.5)
).show(config=config)
Does keyword count correlate with decision?
Code
tmp = (submissions
    .assign(**{'# Keywords':
        submissions['# Keywords']
            .map(lambda x: str(x) if x < 10 else '≥10')
    })
    .value_counts(['# Keywords', 'Decision'])
    .groupby(level=0)
    .apply(group_stat)
    .droplevel(0)  # [jdf] avoid duplicate column
    .reset_index()
)

# absolute counts
px.bar(tmp,
    x = '# Keywords',
    y = '# Submissions',
    barmode = 'stack',
    color = 'Decision',
    custom_data = ['Decision', '% Submissions', 'Total'],
).update_traces(
    hovertemplate = '%{y} (%{customdata[1]}%) of %{customdata[2]} submissions with %{x} keywords had decision "%{customdata[0]}"<extra></extra>',
).update_layout(
    xaxis_dtick = 1,
    xaxis_type = 'category',
    xaxis_categoryorder = 'category ascending',
    title = 'Decisions by keyword count',
    **aspect(0.5)
).show(config=config)

# percentages
px.bar(tmp,
    x = '# Keywords',
    y = '% Submissions',
    barmode = 'stack',
    color = 'Decision',
    custom_data = ['Decision', '# Submissions', 'Total'],
).update_traces(
    hovertemplate = '%{y}% (%{customdata[1]} in total) of %{customdata[2]} submissions with %{x} keywords had decision "%{customdata[0]}"<extra></extra>',
).update_layout(
    xaxis_dtick = 1,
    xaxis_type = 'category',
    xaxis_categoryorder = 'category ascending',
    title = 'Decisions by keyword count',
    **aspect(0.5)
).show(config=config)
Do specific keywords correlate with decision?
Code
# do a manual histogram to include non-specified keywords
k_dec = (pd.crosstab(k_all["Short Name"], k_all["FinalDecision"]).stack()
    ## changed this from value_counts to crosstab, to include counts of 0,
    ## which plotly's sorting seems to need to work correctly
    .groupby(level = 0)
    .apply(group_stat)
    .droplevel(level=0)
    .reset_index()
)
k_dec = k_dec.sort_values('Total', ascending=False)

# absolute counts
px.bar(k_dec,
    x = 'Short Name',
    y = '# Submissions',
    color = 'FinalDecision',
    custom_data = ['FinalDecision', '% Submissions', 'Total'],
).update_layout(
    xaxis_dtick = 1,
    xaxis_tickfont_size = 8,
    title = 'Decision by presence of keyword',
    **aspect(0.4),
    xaxis_categoryorder = 'trace'
).update_traces(
    # closing quote after the decision added; it was missing in the original template
    hovertemplate = "%{y} of %{customdata[2]} submissions (%{customdata[1]}%) specifying keyword '%{x}' had decision '%{customdata[0]}'<extra></extra>"
).show(config=config)

# percentages
px.bar(k_dec,
    x = 'Short Name',
    y = '% Submissions',
    color = 'FinalDecision',
    custom_data = ['FinalDecision', '# Submissions', 'Total'],
).update_layout(
    xaxis_categoryorder = 'trace',
    xaxis_dtick = 1,
    xaxis_tickfont_size = 8,
    xaxis_fixedrange = True,
    yaxis_fixedrange = True,
    title = 'Decision by presence of keyword',
    **aspect(0.4),
).update_traces(
    hovertemplate = "%{y}% of %{customdata[2]} submissions (%{customdata[1]} in total) specifying keyword '%{x}' had decision '%{customdata[0]}'<extra></extra>"
).show(config=config)
How often are keywords “esoteric”, i.e. used alone?
tmp = (bids
    .value_counts(['Reviewer', 'Bid'], sort=False)
    .rename('# Bids')
    .reset_index()
)

px.bar(tmp,
    x = 'Reviewer',
    y = '# Bids',
    color = 'Bid'
).update_layout(
    xaxis_type = 'category',
    xaxis_categoryorder = 'total descending',
    xaxis_showticklabels = False,
    **aspect(0.4)
).update_traces(
    hovertemplate = 'Reviewer %{x} made %{y} "%{fullData.name}" bids.<extra></extra>'
).show(config=config)
How many (positive) bids did each submission receive?
Code
tmp = (bids
    .value_counts(['Paper ID', 'Bid'], sort=False)
    #.value_counts(['sid', 'Bid'], sort=False)
    .rename('# Bids')
    .reset_index()
    .loc[lambda x: x.Bid.isin(['want', 'willing'])]
)

px.bar(tmp,
    x = 'Paper ID',
    #x = 'sid',
    y = '# Bids',
    color = 'Bid'
).update_layout(
    xaxis_type = 'category',
    xaxis_categoryorder = 'total descending',
    xaxis_showticklabels = False,
    title = 'Positive Bids per Paper',
    **aspect(0.4),
).update_traces(
    hovertemplate = 'Paper %{x} received %{y} "%{fullData.name}" bids.<extra></extra>',
).show(config=config)
Code
popular = 15

tmp = (bids
    .query('Bid in ["want", "willing"]')
    .value_counts(['Paper ID', 'Bid'], sort=False)
    # .value_counts(['sid', 'Bid'], sort=False)
    .unstack()
    .fillna(0)
    .groupby(['want', 'willing'])
    .apply(lambda g: pd.Series({'ids': g.index.values, 'count': g.index.size}), include_groups=False)
    .reset_index()
    .assign(popular = lambda df: np.where(
        df['willing'] + df['want'] >= popular,
        "≥ %d" % popular,
        "< %d" % popular))
)

px.scatter(tmp,
    x = 'willing',
    y = 'want',
    size = 'count',
    color = 'popular',
    custom_data = ['count', 'ids'],
).update_layout(
    legend_title = 'Total Pos. Bids',
    title = 'Distribution of Positive Bids',
    **aspect(0.4)
).update_traces(
    hovertemplate = '%{customdata[0]} papers received %{x} "willing" and %{y} "want" bids',
).show(config=config)
Does the presence of specific keywords correlate with bidding?
We fit a reviewer-independent ridge regression model in which the independent variables are the (weighted) presence of each keyword and the dependent variable is the overall reviewer interest in a submission. We measure interest by summing bid scores: in the code below, each “willing” bid scores 1 and each “want” bid scores 2:
Code
tmp_3 = staticdata['keywords'].copy()
tmp_3['ix'] = list(range(len(tmp_3)))
tmp_3 = tmp_3[['Short Name', 'ix']]
tmp_1 = k_all[['Paper ID', 'Short Name']]
#tmp_1 = k_all[['sid', 'Short Name']]
tmp_2 = bids[(bids['Bid'] == 'willing') | (bids['Bid'] == 'want')]
df = tmp_1.merge(tmp_3, on="Short Name").merge(tmp_2, on="Paper ID")
#df = tmp_1.merge(tmp_3, on="Short Name").merge(tmp_2, on="sid")
df['weight'] = 2
df.loc[df['Bid'] == 'willing', 'weight'] = 1
total_weight = df[['Paper ID', 'ix', 'weight']].groupby(['Paper ID', 'ix']).sum().reset_index()
keyword_count = tmp_1.groupby(['Paper ID']).count().reset_index()
#total_weight = df[['sid', 'ix', 'weight']].groupby(['sid', 'ix']).sum().reset_index()
#keyword_count = tmp_1.groupby(['sid']).count().reset_index()
keyword_count['Keyword Weight'] = 1.0 / keyword_count['Short Name']
total_weight = total_weight.merge(keyword_count[['Paper ID', 'Keyword Weight']], on="Paper ID")
nrows = max(total_weight['Paper ID']) + 1
#total_weight = total_weight.merge(keyword_count[['sid', 'Keyword Weight']], on="sid")
#nrows = max(total_weight['sid']) + 1
ncols = max(total_weight['ix']) + 1
design_matrix = np.zeros((nrows, ncols))
design_matrix.shape
rhs = np.zeros(nrows)

# this is embarrassing, there must be a fancy pandas way of doing it.
# someone else can figure it out.
for i, row in total_weight.iterrows():
    design_matrix[int(row['Paper ID']), int(row['ix'])] = row['Keyword Weight']
    #design_matrix[int(row['sid']), int(row['ix'])] = row['Keyword Weight']
    rhs[int(row['Paper ID'])] = row['weight']
    #rhs[int(row['sid'])] = row['weight']

import scipy.linalg
from sklearn.linear_model import Ridge

# Ideally, we find the best regularizer by splitting into training/validation,
# but on inspection the order doesn't seem to change too much
lr = Ridge(1).fit(design_matrix, rhs)
lr.coef_
tmp_3['Importance'] = lr.coef_
tmp_3 = tmp_3.sort_values(by=['Importance']).merge(staticdata['keywords'], on='Short Name', )

px.scatter(tmp_3,
    x = "Short Name",
    y = "Importance",
    color = 'Category',
    custom_data = ['Keyword'],
).update_layout(
    title = 'Keyword Importance for Bidding',
    xaxis_dtick = 1,
    xaxis_categoryorder = 'trace',
    xaxis_tickfont_size = 8,
    **aspect(0.4)
).update_traces(
    hovertemplate = 'Importance of "%{customdata[0]}": %{y}<extra></extra>'
).show(config=config)
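The cell above fills its design matrix row by row and notes that a vectorized alternative should exist. The following is a minimal sketch of one such alternative using NumPy integer-array indexing, assuming integer paper and keyword indices as in the loop. The function names `build_design` and `ridge_no_intercept` and the toy numbers are hypothetical. The closed-form solve omits the intercept, so its coefficients will generally differ from those of scikit-learn's `Ridge`, which fits an intercept by default.

```python
import numpy as np

def build_design(paper_ix, keyword_ix, keyword_weight, bid_weight, n_papers, n_keywords):
    """Fill the keyword design matrix and the interest vector without a Python loop."""
    X = np.zeros((n_papers, n_keywords))
    y = np.zeros(n_papers)
    # one (paper, keyword) pair per entry: X[paper, keyword] = 1 / (# keywords of paper)
    X[paper_ix, keyword_ix] = keyword_weight
    # every pair of a paper carries the same total bid score, so repeated writes agree
    y[paper_ix] = bid_weight
    return X, y

def ridge_no_intercept(X, y, alpha=1.0):
    """Solve (X^T X + alpha * I) w = X^T y, i.e. ridge regression without intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# toy data (illustrative only): 3 papers, 2 keywords
paper_ix = np.array([0, 0, 1, 2])
keyword_ix = np.array([0, 1, 0, 1])
keyword_weight = np.array([0.5, 0.5, 1.0, 1.0])
bid_weight = np.array([4.0, 4.0, 3.0, 1.0])

X, y = build_design(paper_ix, keyword_ix, keyword_weight, bid_weight, 3, 2)
coef = ridge_no_intercept(X, y, alpha=1.0)
print(coef)
```

In the notebook, the four index and weight arrays would correspond to the `Paper ID`, `ix`, `Keyword Weight`, and `weight` columns of `total_weight`, converted with `.to_numpy()`.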
Assignment
How many papers were PC members assigned?
Code
tmp = assignments.value_counts(['Reviewer']).rename('# Assignments').reset_index()

px.histogram(tmp,
    x = '# Assignments',
).update_traces(
    hovertemplate = '%{y} reviewers were assigned %{x} submissions',
).update_layout(
    bargap = .1,
    yaxis_title = '# PC members',
    title = 'Distribution of assignments',
    **aspect(0.4)
).show(config=config)

### TODO: Split by year
tmp2 = (assignments
    .merge(submissions['year'], on='Paper ID')
    .value_counts(['Reviewer', 'year'])
    .rename('# Assignments')
    .reset_index()
)

px.histogram(tmp2,
    x = '# Assignments',
    facet_row = "year",
    category_orders = {'year': [2023, 2022, 2021]}
).update_traces(
    hovertemplate = '%{y} reviewers were assigned %{x} submissions'
).update_layout(
    bargap = .1,
    yaxis1_title = '# PC members',
    yaxis2_title = '# PC members',
    yaxis3_title = '# PC members',
    title = 'Distribution of assignments, per year',
    **aspect(0.5)
).show(config=config)
Code
tmp = assignments.value_counts(['Reviewer', 'Role']).reset_index()

px.histogram(tmp,
    x = 'Reviewer',
    color = 'Role',
).update_traces(
    hovertemplate = '%{y} reviewers were assigned %{x} submissions as %{fullData.name}<extra></extra>'
).update_layout(
    bargap = .1,
    barmode = 'group',
    xaxis_title = '# Assignments',
    yaxis_title = '# Members',
    title = 'Distribution of assignments',
    **aspect(0.4)
).show(config=config)

### TODO: Split by year
tmp2 = (assignments
    .merge(submissions['year'], on='Paper ID')
    .value_counts(['Reviewer', 'Role', 'year'])
    .reset_index()
)

px.histogram(tmp2,
    x = 'Reviewer',
    color = 'Role',
    facet_row = 'year',
    category_orders = {'year': [2023, 2022, 2021]}
).update_traces(
    hovertemplate = '%{y} reviewers were assigned %{x} submissions as %{fullData.name}<extra></extra>'
).update_layout(
    bargap = .1,
    barmode = 'group',
    xaxis_title = '# Assignments',
    yaxis1_title = '# Members',
    yaxis2_title = '# Members',
    yaxis3_title = '# Members',
    title = 'Distribution of assignments',
    **aspect(0.5)
).show(config=config)
How many areas did reviewers review in?
Code
tmp = (assignments
    .merge(submissions, on='Paper ID')
    #.merge(submissions, on='sid')
    .groupby('Reviewer')
    .apply(lambda x: len(x['Area'].unique()), include_groups=False)
    .reset_index()
)

px.histogram(tmp,
    x = 0,
).update_traces(
    hovertemplate = '%{y} PC members were assigned submissions from %{x} area(s)',
).update_layout(
    bargap = .1,
    xaxis_title = '# Areas',
    yaxis_title = '# PC members',
    **aspect(0.4),
).show(config=config)

### TODO: Split by year
tmp = (assignments
    .merge(submissions, on='Paper ID')
    .groupby(['Reviewer', 'year'])
    .apply(lambda x: len(x['Area'].unique()), include_groups=False)
    .reset_index()
)

px.histogram(tmp,
    x = 0,
    facet_row = 'year',
    category_orders = {'year': [2023, 2022, 2021]}
).update_traces(
    hovertemplate = '%{y} PC members were assigned submissions from %{x} area(s)',
).update_layout(
    bargap = .1,
    xaxis_title = '# Areas',
    yaxis1_title = '# PC members',
    yaxis2_title = '# PC members',
    yaxis3_title = '# PC members',
    **aspect(0.5),
).show(config=config)
### TODO: Split by year
tmp = bids.assign(
    Score = bids.apply(lambda x: (matchscores.loc[x['Reviewer'], x['Paper ID']]), axis=1),
    Area = bids.apply(lambda x: (submissions.loc[x['Paper ID'], 'Area']), axis=1),
    year = bids.apply(lambda x: (submissions.loc[x['Paper ID'], 'year']), axis=1),
).query('Score > -1.0')

px.box(tmp,
    x = 'Bid',
    y = 'Score',
    color = 'Bid',
    facet_row = 'year'
).update_layout(
    showlegend = False,
    xaxis_categoryorder = 'array',
    xaxis_categoryarray = ['want', 'willing', 'reluctant', 'conflict'],
    **aspect(0.8)
).update_traces(
    line_width = 2,
    boxmean = True
).show(config=config)
Code
px.violin(tmp,
    x = 'Bid',
    y = 'Score',
    color = 'Area',
    box = True,
).update_layout(
    # showlegend = False,
    title = 'Match scores by bid by area',
    xaxis_categoryorder = 'array',
    xaxis_categoryarray = ['want', 'willing', 'reluctant', 'conflict'],
    violingap = 0.2,
    violingroupgap = 0.1,
    **aspect(0.4)
).update_traces(
    box_line_color = 'black',
    box_line_width = 1,
    line_width = 0,
    meanline_visible = True,
    marker_size = 4,
    # boxpoints = 'outliers',
).show(config=config)

### TODO: Split by year
px.violin(tmp,
    x = 'Bid',
    y = 'Score',
    color = 'Area',
    box = True,
    facet_row = 'year'
).update_layout(
    # showlegend = False,
    title = 'Match scores by bid by area, by year',
    xaxis_categoryorder = 'array',
    xaxis_categoryarray = ['want', 'willing', 'reluctant', 'conflict'],
    violingap = 0.2,
    violingroupgap = 0.1,
    **aspect(0.8)
).update_traces(
    box_line_color = 'black',
    box_line_width = 1,
    line_width = 0,
    meanline_visible = True,
    marker_size = 4,
    # boxpoints = 'outliers',
).show(config=config)
How often were reviewers assigned submissions that they bid on?
Code
tmp = (
    assignments
    .merge(bids, on=['Reviewer', 'Paper ID'], how='left')
    # .merge(bids, on=['Reviewer', 'sid'], how='left')
    .value_counts(['Role', 'Bid'])
    .rename('Reviewers')
    .reset_index()
)

fig = px.bar(tmp,
    y = 'Reviewers',
    x = 'Role',
    color = 'Bid',
    custom_data = ['Bid']
).update_traces(
    hovertemplate = '%{y} PC members assigned as %{x} bid %{customdata}<extra></extra>',
).update_layout(
    title = "Assignment by bidding",
    **aspect(0.4),
).show(config=config)

### TODO: Split by year
tmp2 = (
    assignments
    .merge(bids, on=['Reviewer', 'Paper ID'], how='left')
    .merge(submissions, on='Paper ID')
    .value_counts(['Role', 'Bid', 'year'])
    .rename('Reviewers')
    .reset_index()
)

fig = px.bar(tmp2,
    y = 'Reviewers',
    x = 'Role',
    color = 'Bid',
    custom_data = ['Bid'],
    facet_row = 'year'
).update_traces(
    hovertemplate = '%{y} PC members assigned as %{x} bid %{customdata}<extra></extra>',
).update_layout(
    title = "Assignment by bidding, per year",
    **aspect(0.5),
).show(config=config)